DOCUMENT RESUME

ED 164 593                                              TM 008 113

AUTHOR          Scriven, Michael
TITLE           Evaluation Bias and Its Control. Paper #4 in
                Occasional Paper Series.
INSTITUTION     Western Michigan Univ., Kalamazoo. School of
                Education.
SPONS AGENCY    National Science Foundation, Washington, D.C.
PUB DATE        Jun 75
GRANT           GW-7903
NOTE            lip
AVAILABLE FROM  Mary Anne Bunda, The Evaluation Center, College of
                Education, Western Michigan University, Kalamazoo,
                Michigan 49008 ($2.00)

EDRS PRICE      MF-$0.83 Plus Postage. HC Not Available from EDRS.
DESCRIPTORS     Administrative Problems; *Bias; *Credibility;
                *Evaluation Methods; Evaluation Needs; Evaluators;
                Models; *Program Evaluation; Summative Evaluation;
                *Validity
IDENTIFIERS     *Evaluation Problems; *External Evaluation; Meta
                Evaluation



ABSTRACT
Selected aspects of the problem of obtaining unbiased program or
product evaluation are discussed. An evaluator who is a member of the
project staff will have difficulty producing an evaluation which is
credible and valid. Project monitors will also have a problem since
they are often required to assume the conflicting roles of external
evaluator and project advocate. Therefore, no unit should rely
entirely on a given subunit for evaluative feedback about that same
subunit. Evaluative feedback systems require renewal or replacement
to prevent deterioration of their independence. Evaluators should
arrange for replication of their own work by independent evaluators.
Four further approaches for reducing bias in evaluation include:
(1) standardizing the qualitative aspects of evaluation procedures by
using a checklist; (2) improving evaluator training procedures;
(3) reducing sources of bias external to the evaluator; and
(4) comparing the project, programs, or products with alternatives.
(SDM)



***********************************************************************
*  Reproductions supplied by EDRS are the best that can be made       *
*  from the original document.                                        *
***********************************************************************




Evaluation Bias and Its Control*



by 



Michael Scriven 
University of California, Berkeley



June, 1975 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM THE PERSON
OR ORGANIZATION ORIGINATING IT.
POINTS OF VIEW OR OPINIONS STATED
DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.







COLLEGE OF EDUCATION 
WESTERN MICHIGAN UNIVERSITY 
KALAMAZOO, MICHIGAN 49008



Paper #4 
in 

Occasional Paper Series



"PERMISSION TO REPRODUCE THIS
MATERIAL IN MICROFICHE ONLY
HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC) AND
USERS OF THE ERIC SYSTEM."






*A paper originally written in conjunction with the Evaluation
Center's Grant # GW-7903 from the National Science Foundation.



Introduction 

In this paper I shall consider certain aspects of the problem
of obtaining unbiased information about the merits of a program or
product, whether for purposes of decision making or for accountability.
The evaluation of personnel, as well as the evaluation of proposals
and evaluations, generally involves a different set of problems than
those which I will consider here. However, some points made here
will apply. Throughout, efforts are made to consider both the
credibility and the validity of an evaluation, the former being
(roughly) the audience's estimate of the latter.

Since the audience's estimate is sometimes affected by
considerations that are, as it happens, irrelevant in a particular case,
and since the function of an evaluation is sometimes in part to
provide credibility and not just validity, evaluation design must
sometimes involve considerations that go beyond validity. This must not
be viewed as pandering to prejudice, but as of the essence of
certification, of accountability, in a more general sense of the
educational and social obligations of the evaluation. ("It is not
enough that justice be done, it must also be the case that it be
seen that justice is done.")

Let us begin by looking at some typical important practical
cases of bias in program evaluation.

Divided Loyalty and the Co-option of Staff Evaluation

The simplest instance of bias in program evaluation is the case
of the evaluator who is part of the program staff and loses
objectivity because of social and economic bonds to the development
staff, compounded by the cumulative effect of repeated acceptance (or
rejection) of evaluative suggestions. The resulting situation of
quasi co-authorship (or frustrated co-authorship) naturally
destroys the external credibility of the evaluation and often the
validity of the evaluative judgments.

The remedy is to add external evaluators. Being short-term
consultants, these do not or cannot replace the staff evaluators
for day-to-day purposes. (If they are not short-termers, they
rapidly become quasi-staff.) Faced with these visitors, the staff
evaluator often exhibits considerable ambivalence. Professional
bonds struggle with work-mate bonds, with rather erratic results.

Another approach is possible within fairly large organizations,
such as states, most school districts, and R & D units. This
involves the systematic rotation of evaluation staff from project to
project so as to avoid the effects of excessive loyalty or hostility.
This is sometimes complicated by the need for special expertise
(e.g., in math curricula), but the excuse is produced more often
than it deserves. Rotation is usually possible, and nearly always
desirable for much the same reasons as in the diplomatic and armed
services. It should be imposed by management as part of the
discipline of the job, the requirements of good performance, in much
the same way as inservice updating should be required of staff
physicians in a clinic.

Divided Loyalty and Project Monitoring

The project monitor from the funding agency faces related
problems but in a different context. While visiting the project, he or
she is seen as an external evaluator, but back in the capital, a
switch in role is often required (or naturally adopted) to that of
project advocate.

One recently espoused remedy is to segregate the monitoring
function entirely, possibly through subcontracting the evaluation,
to rechristen the liaison person with a title such as program
officer or associate, and thus wholly legitimate the advocacy role at
the agency. Another solution is to interchange the roles just
described, that is, have the monitor carry out the evaluation and have
the project appoint a resident advocate representative in Washington,
or have someone on the staff who could go there at a moment's notice.
The big contractors, of course, adopt this alternative.

Now, using such cases and proto-solutions as a springboard, can
we begin to see the outlines of some general approaches?

Organizational Bias Control

The first great step towards accountability (or just towards
decent work) consisted in requiring that there be some evaluation of
tax-funded or foundation-funded projects. At first, this meant no
more than rechristening the final report. In any case it amounted
to requesting Jones to be sure to tell the agency whether she or he
had done a good job. This is obviously not likely to produce
unbiased feedback, but it is less obvious that there are two sources
of bias in the situation. The agency has made a grant, so it is in
a parental role, and the success or failure of the grant is partly
an evaluation of the agency itself. Organizationally, the situation
can be represented in terms of a table or diagram as shown below.



Figure I illustrates the situation we have just been discussing
(letters represent actions, such as requests, payment, orders, and
support; numbers represent evaluative feedback). We notice that
there is a closed circuit in the evaluation of the project,
beginning with A1 and concluding with 3.2, i.e., the money goes to the
organization that evaluates its use. Similarly, project management
sends money down A2 and gets evaluation feedback via 1, 2, and 3.1.



Funding Agency
    |  A1 (down)          ^  3.2 (up)
Project Management
    |  A2, B, C (down)    ^  1, 2, 3.1 (up)
Production Process & Staff

Funding Agency --D--> New Projects

A    = Initiation steps, staff appointments, material orders, etc.
A1   = Initiating actions by Funding Agency
A2   = Initiating actions by Project Management
B, C = Developmental decisions, staff replacement, etc.
D    = Agency decisions about future funding
1    = Early formative evaluation feedback to service decisions B
2    = Later feedback to service decisions C
3.1  = Final summary of data to service 3.2
3.2  = Summative evaluation of project; can be regarded as formative
       by the foundation with respect to its ongoing activity, i.e.,
       as serving D
t    = Time

Figure I

Cycles of Decisions and Feedback: Internal Evaluation Only



These circles are risky, though not wholly avoidable. The trick
is to provide some procedure that eliminates the complete dependence
of management on a single feedback circuit. The feedback systems of
Figure I are inherently biased positively, and we have to introduce a
circuit with a balancing bias, or some "reality constraint" that cuts
across a circle. This can be represented as in Figure II. The
sample circle is broken by the intrusion of field trial results.
Notice it is broken only if that data goes straight to project
management. If it goes only to the production staff, it does not cut
the circle (actually, an overlapping series of circles) of formative
evaluation. And at best it only cuts the formative circle, not the
summative one.



Project Management
    |  A, B, C (down)     ^  1, 2 (up)        ^  2* (up, direct)
Production Staff                              Results of Field Trials

A    = Initiating actions (the content of A2) by Project Management
B, C = Developmental decisions, staff replacement, etc.
1    = Early formative evaluation feedback to service decisions B
2    = Later feedback to service decisions C
2*   = Formative feedback from field trials provided to Project
       Management
t    = Time

Figure II

Cycles of Decisions and Feedback with Internal Evaluation
Plus Field Trials
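
The circle-cutting condition described for Figure II can be sketched in a few lines of Python. This is only an illustration and not part of the original paper: the function name `circle_is_cut`, the idea of writing each feedback route as a tuple of unit names, and the unit labels themselves are assumptions made for the example.

```python
def circle_is_cut(manager, funded_unit, feedback_routes):
    """Return True if `manager` gets some feedback that bypasses `funded_unit`.

    Each route is a tuple of unit names, ordered from where the data
    originates to who finally receives it (a hypothetical representation).
    """
    return any(
        route[-1] == manager and funded_unit not in route
        for route in feedback_routes
    )


# Figure I: the only feedback comes from the staff being funded.
figure_1 = [("Production Staff", "Project Management")]

# Figure II: field trial results go straight to project management.
figure_2 = figure_1 + [("Field Trials", "Project Management")]

# Field trial results that reach management only via the production staff.
filtered = figure_1 + [
    ("Field Trials", "Production Staff", "Project Management")
]

for name, routes in [("Figure I", figure_1),
                     ("Figure II", figure_2),
                     ("filtered", filtered)]:
    print(name, circle_is_cut("Project Management", "Production Staff", routes))
```

Under these assumptions only the Figure II arrangement counts as cut, which matches the remark that field trial data must go straight to project management to break the formative circle.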



It is impossible to understand the persistence of the incestuous
pattern of Figure I with its tendency towards optimistic bias unless
one realizes that both parties involved have a motive for continuing
it. The agency wants favorable feedback about its actions, and the
project wants the agency to think well of it (and wants to think well
of itself). So, of course, the stable situation is one of highly
favorable evaluation. In the technological area, there is
reality-feedback to the agency later (breaking the potentially vicious
summative circle) from marketing or medical data, which keeps the
system honest. But in education and armaments, though such feedback is
possible in principle, it is all too often transmitted through and
hence open to (possibly unconscious) corruption by the responsible
agency: it does not break the circle.

Against this formidable alliance, the search for truth is a
little short of soldiers. If the principal value of the funding
agent is maximizing the social contribution of every dollar granted,
as of course its rhetoric and in fact its situation requires, then there
has to be an attempt to get evaluation of its projects from sources
not quite so predisposed towards a favorable response.

The circle we are now talking about trying to break is a third
one, superimposed above the two already diagrammed. It begins with
Congress, or the taxpayer via Congress, funding the agency and
eventually receiving evaluation reports from the agency on its
stewardship. Recent years have seen Congress increasingly sensitive to
the managerial weaknesses of that system, bringing in OMB, GAO, and OCA
as independent evaluators providing a feedback loop with at least less
tendency to positive bias.




If only the reality data was readable by the amateur, one
would just have to compare it to the reports for the evaluations to
have a circle-breaking procedure. But that data, on big projects,
needs computer processing, statistical reduction and expert
interpretation before its significance is apparent. Each of these steps
involves the possibility of distortion and the necessity of
expertise. If the experts used are the same ones whose performance is
being evaluated, the "reality" line does not cut the circle. Hence
a more general solution involves using an independent gatherer/
interpreter of reality-data, the external evaluator.

Perhaps we can infer preliminary forms of two general
principles for minimizing bias from these considerations. They probably
have only mnemonic status, not deep theoretical significance, but
at least they are comprehensible. The First Principle is the Principle
of Independent Feedback, which states that no unit should rely entirely
on a given subunit for evaluative feedback about that same subunit.
Diagrammatically we need to replace:

    Superior
       |
    Subordinate

with:

    Superior
       |  \
       |   Evaluator
       |  /
    Subordinate



The fact that one has an independent feedback loop between one pair of
levels does not satisfy the First Principle; it requires such an
arrangement (not necessarily permanently installed) between every pair
of adjacent levels. Here is a situation one frequently encounters:

    Superintendent
          |
    Project Director
          |  \
          |   Evaluator
          |  /
    Project Staff

One young evaluator in this situation came to me not long ago with a
sad but not unusual story. He had evaluated the project as required
and submitted a report. All the critical comments in the evaluation
were then excised, his name removed, and the result forwarded to the
superintendent by the director as a "synthesized" evaluation of the
project. Of course, the responsibility for getting corrupt feedback
like that is partly the superintendent's, for violating the First
Principle.
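
The requirement that every pair of adjacent levels have its own independent feedback can be written out as a small check. What follows is a rough sketch under stated assumptions, not a method given in the paper: the `Report` record, its fields, the function `unprotected_pairs`, and the simplification that a report "about" a unit is named after that unit's head are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    author: str             # who wrote the evaluation
    about: str              # the unit being evaluated (named by its head)
    recipient: str          # who finally reads it
    handled_by: tuple = ()  # intermediaries able to edit it on the way up


def unprotected_pairs(chain, reports):
    """List adjacent (superior, subordinate) pairs with no independent feedback.

    A report is treated as independent for a pair if it is about the
    subordinate, reaches the superior, and was neither written nor
    handled by the subordinate (an assumed reading of the First Principle).
    """
    pairs = []
    for superior, subordinate in zip(chain, chain[1:]):
        independent = any(
            r.about == subordinate
            and r.recipient == superior
            and r.author != subordinate
            and subordinate not in r.handled_by
            for r in reports
        )
        if not independent:
            pairs.append((superior, subordinate))
    return pairs


chain = ["Superintendent", "Project Director", "Project Staff"]

# The situation in the story: the director edits what goes upstairs.
as_found = [
    Report("Evaluator", "Project Staff", "Project Director"),
    Report("Evaluator", "Project Director", "Superintendent",
           handled_by=("Project Director",)),
]

# Same arrangement plus a copy sent directly to the superintendent.
with_direct_copy = as_found + [
    Report("Evaluator", "Project Director", "Superintendent"),
]

print(unprotected_pairs(chain, as_found))
print(unprotected_pairs(chain, with_direct_copy))
```

In this toy model the first arrangement leaves the superintendent-director pair unprotected, and adding a route that the director cannot edit removes the gap.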

The cheap way to get independent feedback in these situations is
to bypass the whole chain of command with the feedback loop and put a
single evaluator in it, instead of duplicating the evaluator installed
at the lower level; or one can use the evaluator already shown in a
double role. A device I have introduced into Title VII evaluation
arrangements simply requires that a duplicate copy of all communications
from the evaluator to the project director goes up to the higher level
(in the Title VII situation that means the project officer at USOE).
This makes the director take the evaluation much more seriously in the
formative stage, and makes cheating (as in the case described above)
impossible in the summative situation. Of course, the director can
supplement, annotate, or refute the evaluator's contentions--but must
do so openly, not by excision. There are of course some costs in time
and friendship, but there are no free lunches in evaluation.

Consider the evaluation of teachers (or students) in the light
of this principle; the same remarks apply to projects. Self-styled
progressive schools and colleges sometimes go in for so-called
self-evaluation--meaning a reflective but wholly self-generated
report--as the key procedure. While it probably has a place in a decent
system of evaluation, it cannot replace such a system. A better system
uses feedback from the students directly to the department chairperson.
A still better system (Swarthmore, Oxbridge, Australia) uses an external
examiner to determine the students' achievement and hence the efficacy
of the teacher, a better indicator than opinion. The feedback loops
for the three systems differ as shown in the diagram below.
    (Self-Evaluation)      Teacher --> Department Chairperson

    (Student Evaluation)   Students --> Chair
                           (about the Teacher)

    (External Examiner)    Students --> Evaluator --> Chair
                           (about the Teacher)

It is extremely difficult to dismiss the arguments for external
evaluation.

The most general issue in this area concerns the decision whether
to segregate the in-house evaluation staff in their own unit, or have
evaluators attached to and paid by each regular unit. In the case of
an educational materials development institution (R&D center or
publisher, for example), the options look like this:



                             Director
          /                     |                     \
  Production Unit 1     Production Unit 2       Evaluation Unit
     /       \             /       \              /         \
 Writers  Designers    Writers  Designers    Evaluator    Evaluator
                                             of Unit 1    of Unit 2

Figure III
Segregated Evaluation Model



                           Director
              /                              \
      Production Unit 1               Production Unit 2
     /        |        \             /        |        \
 Writers  Designers  Evaluator   Writers  Designers  Evaluator
                     of Unit 1                       of Unit 2

Figure IV
Integrated Evaluation Model



It is obvious that the "integrated" model (Figure IV) violates the
First Principle at the first "step-down," as far as the director's
feedback is concerned. And, provided that the about-to-be-discussed
Second Principle is taken into account, it is my experience that the
"segregated" plan does work better. Of course, the feedback from the
evaluators must go to the unit managers as well as to the director of
the whole shop, and there may also be a need for internal evaluation
staff within the units of a large shop.

The trade-offs one must accept in using the segregated model
sometimes involve loss of access to data or help with interpreting it,
because the quasi-external evaluator in this model is seen as more
alien, more of a threat than in the integrated model. A "hybrid" plan,
similar to the one mentioned earlier, is also possible with the
evaluators located in units but with double reporting duties. Sometimes
this makes more sense if the outside reporting route is via a principal
evaluation officer on the director's staff who schedules regular
meetings with the evaluators for purposes of development, sharing of
problems, discussions with consultants, etc.

The Second Principle, the Principle of the Instability of
Independence, reminds one that organization charts rarely reflect much
of reality, and, in particular, the longer they have been true, the less
true they are, particularly as a basis for evaluation feedback mapping.
Independence, when it exists at all, is a fleeting state conspired
against by almost all forces in a bureaucracy, and the Second
Principle tells you that you have to have a definite program of
systematic renewal and replacement or your evaluative feedback system
will deteriorate severely, often without any sign that will be apparent
internally. The simplest case of this is the staff evaluator who gets
co-opted by the acceptance of his or her criticism or suggestions.
Looking at the organization chart in Figure IV from the viewpoint of
the unit manager in an integrated setup, it appears that the First
Principle has been applied. And so it has, formally speaking. There
is independent feedback if the criterion for independence is separate
bodies. Obviously that is neither necessary nor sufficient. But it
is a good start because anything less lacks credibility. It is not
enough because friendship and enmity and ignorance and rigidification
do not show up on charts. You must have something in the system that
will identify deterioration of independence or objectivity and
provide support or replacement where indicated.

The Second Principle requires that provision must be made to
insure and continually reinsure the independence of the evaluators. The
informal version of this principle is: Make sure the evaluators get
evaluated. This suggests a worry about infinite regression, but no
such necessity occurs in practice because of rapid convergence. To
take a specific example, the Central Midwest Regional Lab used to have
(and perhaps still does have) three levels of evaluators operating on
a hybrid model with an annual external review by a National Advisory
Board on Evaluation. The Lab's director furthermore arranged for a
steady influx of new blood to the National Board by rotating people
off it. Thus at the "working level" there are staff evaluators,
assigned and reporting to projects but selected with help from and
monitored by the Lab's evaluation officer (second level), who reports
to the lab director and is directly overseen by the National Advisory
Board (third level). The only organizational weakness that turned up
in that scheme was what looked to me (as Chairman of the National
Board) like a long-run (three year) reduction in sensitivity of the
project evaluators, and the Second Principle would suggest using a
rotation system to avoid this.




Another worry about the Second Principle is that adding
hierarchies of evaluators looks like a costly business. It should
involve no net cost at all, usually no net cost to the organization,
and certainly no long-run net cost to the "consumer" (taxpayers
and/or users), and it should be designed within that constraint. (See
"The Doctrine of Cost Free Evaluation" [Scriven, 1974, pp. 85-93].)
The Second Principle implies that independence requires regular
verification and support and can be seen as the diachronic (through
time) complement of the synchronic First Principle. The Second Principle
may lead to recommending an oscillation between two organizational
arrangements just because the aging of organizations leads to senility
(after they achieve maturity). Moreover, a return to an arrangement
that was initially inferior--on First Principle grounds--may provide
considerable improvement in spite of what might appear to be decreases
in the independence of the feedback. For example, after a period of
heavy reliance on a particular external evaluator with a very definite
"line" about evaluation, a project may benefit from a period of internal
evaluation where the lessons learned from the outsider can be built into
the ongoing work in ways that may be too subtle for the short-term
external consultant to detect. But what has really happened here is
not at variance with the Second Principle over a short period. The
staff--once they have been sensitized by the external evaluator--are
now in a position to produce evaluative suggestions that are independent
of him or her and, at this stage, more useful to the project. I have
seen this point corrupted into the idea that "from now on we don't
need external review." Of course, it will only be a matter of months,




at most a year, before the rigidifying effects of a constant social
environment are likely to lead to oversights that a continued
application of the Second Principle would identify, and so the
introduction of some new kind of independent feedback loop should be
planned.

A milder treatment is to switch to another external evaluator.
Such a move is sometimes called for by the Second Principle. It is
called for if the suggestions from the original evaluator's next visit
are (a) completely predictable, (b) probably unfeasible or invalid,
although (c) the situation or data or staff have changed very
significantly. Here we have the not uncommon phenomenon of
rigidification of the evaluator. That is one kind of loss of
independence--the bias now being internal (to the evaluator) rather
than external (e.g., due to economic advantages of a favorable
judgment). On the other hand, there are occasions when the repeated
evaluation is as true as ever and the advice given is as sound, and the
reasons for rejecting it as unsound. Then a switch to another competent
evaluator will predictably produce the same advice. To put it another
way, there's nothing wrong with the independence of the present
evaluator, and the Second Principle cannot be invoked to justify a
change. Loss of independence with time is a tendency and does not
necessarily occur in less than fifty years, though it's likely to occur
in less than one.

A Closer Look at Bias 

The quest for objectivity via the criterion of independence often
leads to the use of "external" evaluators in both the formative and
summative situations. Now of course, externality is always relative.
Using someone from another department or school may be external
enough for one's needs. But there are ties that bind across those
little gaps--ties of family, friendship, political alliances, and even
the sameness of professional commitment. It is nearly always possible
to find important similarities and/or differences in the value systems
of any evaluator and any evaluee. That is too often taken to be a
sign of disqualifying bias. It is not. It is only a possible cause
of such a bias, not proof of its presence.

There is a crucial ambiguity in the concept of bias. It is
sometimes interpreted as a statistically likely tendency to systematic
error (against which nepotism rules are formulated) and sometimes as
an actual and systematic increase in the frequency of errors. The
former is crucial in credibility considerations and the latter
(narrower) concept in validity considerations. We need to be clear
that only the latter affects validity. The Second Principle does
not guarantee increasing bias (in the second sense), only an increasing
probability of it. In legal and moral as well as scientific contexts,
only the second sense is relevant (except when politics is part of the
problem, which sometimes converts the issue into a credibility one).
One way to put the point is to say that one can overcome bias in the
second sense but not in the first. If one's spouse is put on one's
staff, one has become biased in the first sense; one will have a
tendency towards bias in the second sense--but one may be able to
transcend it. Bias, in the first sense, is a statistical tendency in a
group of which you are a member; in the second sense it is a tendency
which has in fact infected you. Only in that sense is bias fatal to
objectivity.






We have been stressing considerations of independence here,
because this represents a partial operationalization of the crucial
concept of bias. Is the evaluator's opinion formed on the basis of
the relevant available evidence, independent of the irrelevant
considerations such as friendship? That is the key question, and it
is not hard to find evaluators who are highly independent of their
social ties in this sense. Of course, for credibility purposes one
will have to avoid the extremes of nepotism, etc. And a refinement
of the First Principle warns us to beware of regarding people who are
physically separate as judgmentally independent when they are paid by
the same hand and rewarded for the alleged success or actual continuance
of the same project.

The Second Principle warns us to look at the diachronic dimension
when checking independence, and it interacts with the First Principle
in various ways. Suppose you do hire an outside firm for evaluating
a project, a firm whose headquarters are in a distant state. This
looks like real independence. But ask yourself what the reward
system is for that firm. It isn't any more rewarding for them if your
project is successful or not, per se--and that's why you value their
opinion, why they appear independent. But look a little deeper, or
longer. What is rewarding to them over the years? Success in their
business, which of course requires a continued flow of contracts.
Since such firms are very well aware of the power of the grapevine
in getting further clients, they are often well aware that an evaluation
which shows the client in a good light is much more conducive to later
contracts than a critical evaluation. The reverse side of this coin
was brought home to me when communicating with a network of evaluators
on a USOE grant. I heard more than one sad tale of "blackballing" an
evaluator who gave a deservedly critical evaluation. In short, the
"independence" of an external evaluator can be seriously compromised
by the constraints of business success. For a brilliant exposition
of the same phenomenon in the world of CPA's, see Briloff (1973).

Think back to the example of student evaluation of teaching. The
time sequence is crucial. If the effects of that evaluation on
the teacher will occur before the teacher evaluates the students, they
have an incentive to give false positive evaluations. If the teacher
evaluates the students before the reverse occurs, they have a "getting
even" motivation for false negative evaluation, and the teacher has a
bribery motivation for false positive evaluation of the students. It
is possible to handle these problems, but it is usually done badly
because no one looks at the feedback loops.
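
The dependence on time sequence can be laid out as a simple lookup. This is just a restatement of the paragraph above in Python, offered as a sketch: the function name `distortion_incentives` and the two ordering labels are hypothetical, and the returned strings only summarize the incentives the text names.

```python
def distortion_incentives(first_to_take_effect):
    """Return the false-report incentives the text attributes to each ordering.

    `first_to_take_effect` is a hypothetical label for which evaluation
    has its effect first:
      "student_ratings" - the ratings affect the teacher before grading
      "teacher_grades"  - the teacher grades before the students rate
    """
    if first_to_take_effect == "student_ratings":
        # Students have yet to be graded, so they are tempted to flatter.
        return {"students": "false positive"}
    if first_to_take_effect == "teacher_grades":
        # Students can get even; the teacher can buy ratings with grades.
        return {"students": "false negative", "teacher": "false positive"}
    raise ValueError(f"unknown ordering: {first_to_take_effect!r}")


for ordering in ("student_ratings", "teacher_grades"):
    print(ordering, distortion_incentives(ordering))
```

Writing the two orderings side by side makes it plain that neither is free of incentive to distort, which is why the feedback loops have to be examined before a schedule is chosen.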

An example of the way in which apparent independence is corrupted
by professional ties can be seen in most accreditation reports by teams
visiting, e.g., high schools. The team contains, e.g., specialists in
driver education, who "site-visit" the driver education department and
return with the judgment that driver education needs more support than
it's getting from the school administration.

Practical Implications

Four morals emerge of concern to us all, evaluators and evaluees
alike. First, it is a serious management error to provide funds for
external summative evaluation to a project, since if the project
management contracts out evaluation of their work, the phenomenon
just described will have the maximum effect, i.e., they will tend to
pick "friendly" evaluators or fix the RFP to eliminate some serious
sources of negative evaluation (Sesame Street's contract to ETS is
an example). Second, where the funding agency contracts out the
evaluation itself, thereby avoiding the preceding objection, one is
still not entirely free of the problem since the agency's own decision
to fund the project is indirectly under evaluation and hence they too
tend to want a favorable report, a fact which they quickly signal to
the evaluator. Even where the project wasn't much favored by them,
but imposed by Congress, the agency is often incapable of avoiding
ego-involvement in it. USOE's suppression of a moderately critical
Title I evaluation is a well-known example, and NSF has been involved
in a similar case (so has every human institution, no doubt; the
question is only whether serious efforts are made to minimize the
frequency of such occurrences).

Now if an agency can't ask its projects to get the summative
evaluation done, and isn't above suspicion even when it hires the
evaluators itself, what's left? Either a general-purpose evaluation
office, like the General Accounting Office which currently serves
this function as well as the fiscal one (albeit rather incompetently
since their staff has little training in the new role of general
evaluation), or increased pressure from the ultimate loser (the
taxpayer) via Congress to get the ego-protection of agencies rated lower
than getting objective information to the public. Congress'
tendency to Monday-morning quarterbacking is a major cause of this
trouble.




There is a "next-best" procedure, if neither of the preceding
two suggestions can be immediately effectuated. It is quite natural
for an agency that contracts independently for its evaluation
to use the same liaison officer for the evaluation contract as
the project contract. This is a fatal mistake. There must be at
least separate individuals involved, even if not separate divisions
of the agency. The reason is simple. The normal type of pressure
on the liaison officer, discussed earlier, rapidly converts him into
an advocate of the project back at the agency. Indeed, it is entirely
appropriate that he should fulfill this role, since there's usually
nobody else to do it after the initial recommendation comes in from
the review committee (which can be considered an advocate of the
project in some remote sense). The problem is that if this officer
is also handling the evaluation project, the advocacy will lead to
pressures on the evaluation contractors to soften their report, or is
likely to lead to these pressures, in a way that simply reduces the
independence of the feedback to the agency and the administrator.
This has now happened too often for it to be ignored any longer.

One can see the sequence of sophistication in terms of the
following steps in an imaginary history of evaluation arrangements.
The first step consisted in asking the project to be sure to do an
evaluation of itself. The second step consisted in asking the project
to use an advisory committee of external experts to help it do an
evaluation of itself. The next step consisted in requiring that it
devote specific monies to evaluation; both of the previous steps,
apart from bias, suffered from the fact that overruns were taken
out of the hide of the evaluation. But this step still meant that
the project--even if they appointed a subgroup of their staff to the
summative evaluation task--was evaluating itself. The next step
consisted in requiring that the project sub-contract the evaluation.
This still left open the "control" of setting up the design part of
the RFP in such a way as to exclude appropriate criticism and
selecting sub-contractors partly (and perhaps unconsciously) because
of probable favorable tendencies. The next step was to have the
agency sub-contract the evaluation. This is unsatisfactory for the
reasons we have just described. The best arrangement is to have a
separate agency in charge of evaluations, certainly cooperating with
evaluation staff and liaison officers of both the project and the
agency that's funding the project; or at least a sub-agency.

Third moral: if projects cannot self-evaluate objectively and
if the commercial evaluators are open to biases just mentioned and
if the changes just mentioned have not yet occurred, it looks as if
one will not be able to find good evaluators. There are two routes
to go. The big shops like ETS, RAND, and SDC do have a degree of
independence of any particular agency or officer and can afford to
choose independence over back-scratching, at least part of the
time; and they do have strong professional status needs as well as
economic ones. The other route is exemplified by Briloff (1973)
in the accounting field, i.e., by someone who has a permanent
full-time fall-back job which provides a perfectly acceptable alternative
to contract work and one that is positively preferable to compromised
contracting. It is not possible to conclude that the middle-size
full-time shops are in fact less reliable, but it is harder for them
to ignore illicit pressures. There are important trade-off advantages
for them, however: efficiently manageable size, availability of
university resources, flexibility of procedures, etc. Since the only
real test of bias is error, and since some of these shops do run with
a low error-rate, a consumer who is familiar with the track records
might well pick a good midi-shop over the part-timer whose resources
are limited or the big shop where there is considerable variability
in staff quality. Nevertheless, we could do with some evaluators
who are as beyond suspicion as organizational arrangements can make
them. One might argue that Alan Post, the non-partisan Legislative
Analyst for the state government in California, is one paradigm and
the Supreme Court another. I have suggested to NIE that they should
consider reviving a version of NIH's Life Research Fellowship program
for this purpose.

The fourth moral is that since the arguments under the third
point bear closely on the present author's own role as an evaluator,
they should be viewed with exceptional suspicion. Indeed, this is an
essay on suspicion, since without it one cannot avoid serious
contamination. But it is not an essay on the virtues of suspicion
in itself. All suspicion can legitimately do is suggest possibilities,
against which one takes suitable but not absurd precautions, and the
truth of which one subsequently investigates.

Negative Reactions to Bias Control Procedures

Given our cultural emphases, these systems of independent
evaluation are likely to strike us as symptoms of distrust. Given a serious
commitment to effective service, responsibility, or self-improvement,
they would instead be seen as useful--or rather, essential--aids.
Since it is a universal truth that self-evaluation is unlikely to be
reliable, it is a necessary consequence of interest in truth that one
supplement self-evaluation. Hence anyone interested in improving
his or her own performance must arrange for or endorse some kind of
independent evaluation. Thinking about my own teaching or my own
performance as an evaluator, I know that I need independent
assessment of it, and I arrange it whenever funds can be obtained (which
is essentially always, if one really tries). I use such feedback
myself in the formative mode (when experimenting with alternative
approaches) and expect it to be used by others summatively, that is,
for judgment of my performance by my superiors or clients. It seems
to me that a missing major goal in schools of education, and probably
in all tertiary if not secondary education, is the affective goal of
valuing justified criticism (which is not, of course, the same as
enjoying it).

Long experience with lazy or corrupt supervisors in bureaucracies
of all kinds makes it obvious that potentially effective systems
of evaluation are open to all kinds of abuse and neglect. But the
common labor-union (or professorial) response of refusal to participate
in any such system is even less responsible since it rejects a
legitimate demand instead of rejecting illegitimate abuses. A serious loss of
credibility with the parent, voter, and/or taxpayer is a natural and
appropriate result. Refusal to participate is, however, justifiable
if either of two considerations applies: first, that the proposed
system is technically seriously inferior to another feasible and specifiable
system with regard to which cooperation would be forthcoming (the
inferiority to be judged by independent expert evaluators), or second,
that a respectable system of independent mutual evaluation (of the
administrative staff who commission or will conduct the evaluation) is
simultaneously or earlier introduced. It should be noted that
"technically inferior" is not contrasted with "morally inferior" (i.e., more
likely to produce injustice) since it is a technical requirement that
the system minimize injustice. The contrast is with "impressionistically
inferior," i.e., inferior in the view of unskilled personnel who react
largely to perceptions of risks for them. A good evaluation system
nearly always has to involve some moral elements, and its moral status
requires it to weigh the welfare of all people that it affects
proportionately to their stake in the issue. That means it must weigh the
rejection of outstanding job applicants in the balance against the
retention of weak teachers, using the gains and losses for students and
others affected (parents, employers) as additional currency. Morally
speaking, too, it is outrageous that most educational systems which use
administrators to evaluate teachers have nothing worthy of the name in
the way of procedures for evaluating the administrators.

Efficiency, narrowly conceived, is not the only concern of
evaluation systems. Indeed, it is entirely secondary to justice. And the
cardinal principle of justice is that evaluators should be evaluated, a
theme previously stressed but that deserves further explicit discussion
in the following section. Its practical basis lies in that it follows
directly from both principles already enunciated. The infrequency of
its application is an illustration that evaluators are not much more
attracted by tough self-evaluation than are their evaluees.

Metaevaluation

I have used this term to refer to the evaluation of evaluations
or evaluators. Thomas Cook, in the most detailed study made of it so
far, calls it--or a special case of it--secondary evaluation (Cook, 1974,
pp. 155-222). Jim Sanders, in the only essay that I know of by another
author on the topic, follows my usage (Sanders, 1973).* The term
"secondary evaluation" suggests to me evaluation using secondary indicators,
such as teaching style, instead of primary ones, such as learning gains.
The term "metaevaluation" makes some sense to someone used to the
academic terminology (metamathematics, metaphysics, metaphilosophy,
metascience, metapsychology, metaethics) but is for others an opaque
neologism for which I apologize. In a sense this whole paper is a study
in the methodology of metaevaluation. I will stress here a couple of
particularly crucial points about what I would regard as standard
operating procedures. The first arises from the requirement that
evaluators should try to arrange that their own work be replicated, in whole
or part, by other equally competent evaluators working independently.
This is particularly appropriate where any non-standard methodology is
involved or where particularly difficult synthesizing judgments of
overall merit are involved. When this approach is used, it should not
conclude with the submission of the independent reports. Each evaluator
or evaluation team should, after such submission, now critique the
report of the other team and have the opportunity to submit a revised
evaluation report involving such modifications as seem called for after
reading the other report. In certain cases a combined report may be
agreed upon, after a joint "convergence" meeting, a procedure
Stufflebeam has encouraged.

(*See also Stufflebeam [1975].)

A useful special case of the preceding approach is the
adversary arrangement, where one evaluator or team deliberately undertakes
the task of making the very best possible case for the project, given
the data, while another presents the case against. This was admirably
done (on a micro-budget) in the TCITY evaluation by Stake and Denny
(Stake and Gjerde, 1971, pp. 26-27;14). It caused trouble because
defenders of the project felt it legitimated the negative comment. One
would do better to discuss this mode of reporting with the evaluees and
clients in advance to avoid unnecessary defensive reactions like this.
Robert Wolf has recently extended this approach into the "legal model"
of evaluation (Wolf, 1973).

The metaevaluations thus generated (as each team criticizes the
other's evaluation) are very useful for the administrator-client. For
they are the comments of two highly knowledgeable parties with a
reputation on the line. Arranging a design that puts this kind of
leverage on the evaluators is the moral equivalent of the pressure that
the presence (or prospective presence) of an evaluator places on an
evaluee, which has a certain natural justice; but it also provides,
pragmatically speaking, a very substantial incentive for doing one's
best. Goal-free evaluation, which I'll discuss in a moment, is a
natural extension of this type of procedure.

The "double-teaming" procedures just described, besides their
implicit recognition of the truth of the adage about sauce for the goose
being sauce for the gander, are steps towards a scientific approach
to evaluation in that they yield some data for calculating reliabilities.
The approach applies equally well to the evaluation of proposals or
personnel by panels/committees; indeed, it is a scandal that the big
foundations, who dispense most of their funds through their
peer-review panel procedure, do not investigate the reliability of such panels,
especially since there are a number of different ways in which panel
reviews can be conducted, with the resulting probability of significantly
different rankings.
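As a minimal sketch of what "calculating reliabilities" might look like
when two panels score the same set of proposals, the following Python
computes a rank correlation between their rankings. The choice of
Spearman's coefficient, the function names, and the sample scores are all
assumptions made for illustration; the text does not name a statistic.

```python
import math


def ranks(scores):
    """Return 1-based ranks (highest score = rank 1); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next item has the same score (a tie).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one panel gives identical scores")
    return cov / math.sqrt(var_x * var_y)


def spearman(scores_a, scores_b):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both panels must score the same proposals")
    return pearson(ranks(scores_a), ranks(scores_b))


# Hypothetical scores given by two independent panels to five proposals.
panel_a = [9, 7, 5, 3, 1]
panel_b = [8, 9, 4, 5, 2]
agreement = spearman(panel_a, panel_b)
```

A coefficient near 1 would suggest the two panels rank proposals in much
the same order; a value near 0 would suggest that the outcome depends
heavily on which panel happened to do the reviewing.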

The second suggestion I would stress is using the evaluees as
metaevaluators. That is, the preliminary report from the evaluators should
be made available to the evaluees for critical comment, and that comment--
in raw form, or synthesized in a way acceptable to them--should go
forward to the client along with the evaluator's original report and any
modifications that the evaluator feels are appropriate in the light of
this feedback. Guarantees that this uncensored response will be attached
to the evaluation report will often have a favorable impact on openness
to the evaluator at early stages.

The two preceding suggestions might also be taken as items for
inclusion in a handbook of professional ethics. There are others besides
the evaluees who might well be consulted as metaevaluators, for example,
those whose resources are being used for the programs being evaluated.
This proposal for "representation of the affected who are not involved"
has a rather general application and essentially zero recognition. How
many school-board members are representatives of the childless community
on whom the tax burden falls, without any obvious returns? How many of
the advisory panels for, say, the National Park Service include
representatives of those who do not use the parks--but pay for them
almost as heavily and might be interested in using them if their
interests were provided for? Moreover, evaluators should look around
carefully for people with special knowledge and interest in whatever
is being evaluated, even if they do not qualify under the second
suggestion above, that is, as evaluees.


Methodological Approaches to Bias Reduction 

The reduction of bias in the sciences is normally achieved by the
replacement of judgmental procedures by mensuration and calculation.
To a considerable extent the same path can be followed in evaluation.
In fact, the "calculations"--in this case, the statistics--are already
pretty sophisticated, although their selection and interpretation still
require a good deal of judgment. Even there, the choice and
significance of different statistics have been greatly standardized in recent
years with increasing sophistication and advanced training. The problem
is mainly with the qualitative framework of an evaluation, especially
the elements in it that generate the value component of the conclusion.
This means particularly the needs assessment, the comparative dimensions,
and the costing.

I shall confine myself to a mention of four approaches that seem
to me capable of having considerable effect in upgrading the objectivity
of evaluation. First, there is the standardization or routinization of
qualitative aspects of the procedures. A detailed study of scores of
evaluations done during the last six years suggests that a great many of
them (over 90%, at a guess) omit one or more considerations that are
obviously relevant to the assessment of merit they are allegedly
providing. The reasons for the omission are often ego-defensive or
political. (For example, the failure to look at the comparative performance
of critical competitors, essential if evaluation is to serve purchase
decisions and hard to avoid when responsible refunding is being
considered.) But they're also often simple errors of oversight. Both
kinds of omission can be reduced by using a standardized checklist
approach, and I have been encouraged by the extent to which a suggested
version of such a checklist was adopted in its first year after private
circulation (Scriven, 1974, pp. 35-93). The orientation of that 13-point
checklist and profile generator is towards pay-off evaluation. One
developed by Maurice Eash and EPIE (Eash, 1969, pp. 18-24) is aimed
more towards systematic product description and is naturally
considerably more popular amongst producers. Both have legitimate uses, and
both can no doubt be improved. ETS also has one with some special
features (mine originated in some work with ETS on a product review
contract). Some others have been proposed for special purposes, e.g., the
CMAS (Curriculum Materials Analysis System) from SSEC, and the
tremendously valuable checklist covering all the administrative aspects of
an evaluation developed by Dan Stufflebeam (1974). The trend is there
and, given support, can lead to very substantial upgrading of evaluation,
especially of evaluations that should be fairly straightforward, but
that often get bogged down in irrelevancies or omit relevancies.

As an example of an irrelevancy, one sometimes hears the lament that
we can't really evaluate educational products until we have an adequate
theory of learning. This remark displays a total lack of understanding
of the difference between evaluation and explanation. One needs great
professional skill as a product evaluator to set up a valid assessment
of color TV sets, but one needs to know nothing about electronics. On
the other hand, to explain why a particular set triumphed in the
ratings will require such knowledge--in fact, an extremely rare
combination of theory, design, and production engineering skills. Theory
may suggest breakthroughs in design; its contribution to evaluation is
at most that of supporting the use of certain secondary indicators as
criteria for merit. Even there one needs only empirical correlations
of those features with favorable evaluations. The checklist, like the
trouble-shooting chart in the back of an appliance handbook, incorporates
a massive amount of knowledge in a maximally task-oriented form; theories
have the first, but not the second property.

But the improvement of evaluation is not the only pay-off from the
checklist approach. I believe it has already produced significant
improvements in products, for the producer is not only aware that the
checklist may be--in some cases, will be--used in evaluating the
product, and hence tries to meet the standards it expresses, but he or she
is also (to a variable extent) interested in turning out a quality product
and may find the arguments supporting the checklist persuasive in
upgrading his or her conception of what that implies.

The second approach involves upgrading the training procedures for
evaluators, especially in the qualitative dimension. The simplest move
would be to increase enormously the number of evaluations performed
during the training period, perhaps to a hundred or more, with feedback
in one form or another (such as tailored comment, programmed materials,
or the issue of good and bad paradigm answers). Another procedure,
which could be applied in modified form to the training of review panel
members, involves a direct effort to achieve high inter-judge
reliability without introducing correlated error, a procedure that I call
calibration. This is an extension of the first procedure and involves
using a basic set of cases, judging them independently, talking out
differences as far as possible, testing on a new set, and so forth until
reasonable convergence is obtained.
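The calibration loop just described can be sketched in Python as
follows. This is an illustration under stated assumptions, not the
author's procedure: Cohen's kappa is swapped in as one common measure of
inter-judge agreement, and the 0.8 convergence threshold, the function
names, and the sample ratings are all hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two judges rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both judges must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Proportion of cases on which the two judges gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Agreement expected by chance, given each judge's own rating frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


def first_converged_round(rounds, threshold=0.8):
    """Return the index of the first round whose kappa meets the threshold, else None.

    Each round is a pair (ratings_a, ratings_b) made independently on a fresh
    set of cases, after the judges have talked out their earlier differences.
    """
    for index, (ratings_a, ratings_b) in enumerate(rounds):
        if cohen_kappa(ratings_a, ratings_b) >= threshold:
            return index
    return None


# Hypothetical ratings from two trainee evaluators over two calibration rounds.
round_one = (["good", "good", "poor", "poor"], ["good", "poor", "poor", "poor"])
round_two = (["good", "poor", "good", "poor"], ["good", "poor", "good", "poor"])
converged_at = first_converged_round([round_one, round_two])
```

Testing each round on new cases matters here: agreement reached only on
cases the judges have already discussed would not show that their
independent judgments have converged.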

The third approach picks up where training leaves off, but focuses
on the elimination of sources of bias external to the evaluator. We have
already discussed some of these that arise from organizational and economic
factors, the need for further contracts, for instance. We have also
discussed interpersonal ties and argued for the use of external evaluators,
at least in a supplementary role, for both formative and summative
evaluation. Even when we have taken account of all the preceding suggestions,
a type of biasing interaction occurs which has highly significant effects
on the evaluator and needs to be dealt with. It has two dimensions
which are, roughly speaking, affective and cognitive.

The affective influence occurs because of the generally
submissive-obsequious hanging-on-every-word posture which it is difficult for an
evaluee to avoid adopting towards the evaluator, especially if the latter
is evaluating on behalf of the funding agency. This is somewhat too
ego-gratifying for evaluators to suppose that it has no influence on them.
"How can all these nice intelligent people who show their good taste by
asking after my health and work so interestedly (and even, in formative
situations, by selecting and paying me to do the evaluation), possibly
not be doing something truly worthwhile?" The best way to minimize
this influence is by minimizing the social contact with the evaluee
prior to submission of the preliminary version of the report. There
is plenty of time for it later, during the interaction about the report,
and then it is far less time-wasting for project staff. (Site-visit
evaluations always have a disruption cost going against their utility.)
In reacting to the draft evaluation, the evaluees have a focus for
their activities and remarks, and the evaluator has a stake in the
discussion so that a fruitful exchange can occur rather than a "show
and tell" performance.

If one eliminates these prior social exchanges, how does the
evaluator get briefed about the background, aims, and nature of the
project? This question leads us to look at the cognitive biases that
result from such a briefing. If one wants an unbiased view of what
the project does, one would do better to talk to or, better, observe
the users, not the producers. After all, whether formative or
summative, a major function of evaluation is to look at the materials from
the point of view of a prospective user. The user will not get a
visiting fireman treatment. The user will not be concerned with the
background of the product or what it was meant to do, only with what
it actually does. So the evaluator, in simulating the user's
viewpoint, does best to avoid all the "fringe benefits."

Taking these considerations seriously leads one into doing
goal-free evaluation (GFE). It is extremely important as a methodology for
avoiding overfavorable evaluations and for detecting side-effects.
Since one has not been told what the intended effects--goals--are, one
works very hard to discover any effects, without the tunnel vision
induced by a briefing about goals. If GFE sometimes errs in the
direction of being too critical or missing a main effect, the cost
of those errors is insignificant because they can be picked up at
the debriefing. Putting it another way, the GFE mode is the best
way to begin an evaluation because it is reversible without loss,
whereas the GBE (goal-based) mode is not reversible and more likely
to be biased.

One might describe GFE as a step beyond double-blind methodology.
(Some of its critics would probably prefer to call it totally blind.)
In double-blind drug studies, neither patient nor nurse and/or
investigator knows which pill is the placebo and which is the experimental
drug during the period of observation (which is when the bias would
operate). The interest is to get the investigator to look just as
carefully at all patients, without the kind of prejudice that might
lead to projecting effects onto the group that got the experimental
drug. And, of course, to ensure that the "treatment," which involves
both a pill and its presentation, is equalized. The evidence about
the effect of expectations on perceptions is so strong that an
experimental design that does not blind the observer-investigator simply
could not be taken seriously. In triple-blind, the investigator--who
would now have to be different from the developer--would also not know
what the intended effect was. He or she would have to discover what
effect, if any, the administered substances had, from a study of
patients' health, etc., through the period of drug administration, and
thereafter.

Now what possible point could there be in such a procedure? Very
simple: it will make the observer-evaluator struggle hard to find any
and all effects, without prejudice, since his or her reputation is on
the line, and the job has not been pre-defined. Reading a non-existent
effect into the clinical picture, cued by inspiring messages from the
research crew, is made less easy; missing a slight but crucial
side-effect is made more difficult. Of course, the evaluator has access
to the charts and medical history of each patient and it will often
be easy to get an idea of the intended effect from these. But to make
that idea precise, to describe the class of patients for which the
effect appears to be such-and-such, especially given the absence of
cues as to which received a placebo, will put the investigator on
his or her mettle.

In the medical situation, the intended effects are relatively
simple; the class of patients treated is a rather good indicator of the
intended effect, and the consequences of reading non-existent effects
into the data are considerably (but not entirely) mitigated by the
double-blind situation. In education none of these considerations
normally hold, the latter failing since double-blind studies are not
generally possible. Consequently, the advantages of goal-free
evaluation are particularly crucial there, whereas they may be only
marginal in medical research. Apart from the methodological advantages
of making the evaluator hunt for any effects and thereby reducing
the chance of missing side-effects, GFE provides yet another of the
procedures for exerting accountability pressures on the evaluator
in addition to those mentioned in the section on metaevaluation, and
hence restricting the play of bias. There are more detailed discussions
of GFE in House (1973) and Popham (1974).




Finally, it is well worth mentioning the advocate team approach
for generating alternative plans, which can then be comparatively
evaluated. This has been particularly carefully studied and developed
by Dan Stufflebeam's staff, especially by Diane Reinhard (1973) who
applied the emphasis on independence stressed earlier in talking about
feedback channels to input. One notices a deficiency in this dimension
of evaluation not only where complex plans are involved (the area where
adversary methodology has been focused) but also in simple product
evaluation where some ingenuity may be required to identify the
appropriate alternatives. For example, the evaluation of CAI (computer
assisted instruction) should normally involve comparison with programmed
texts using the program content from the computer, since these can be
produced for a minute fraction of the CAI costs, are portable, and
simultaneously usable by many students.

Conclusions

An effort has been made to review a wide range of sources of bias
in evaluation, and preventative measures for them. The resulting
recommendations, taken in toto, provide a fairly comprehensive set of
guidelines for setting up the broad outlines of an evaluation system.
Two normative principles were formulated, the first recommending
independent feedback in evaluation, the second requiring regular review
of this independence. A third principle is inherent in much of the later
discussions of practical procedures; it asserts that the best guarantees
of independence are ignorance and countervailing bias. There are no
wholly unbiased evaluators but there are arrangements which discourage
them from bringing (some of their most damaging) biases to bear, or
where their biases are (at least partially) balanced off. The search
for the pure in heart is more appropriate for mythology than methodology.
We can arrange for independence when we can't find it; it can be a
property of a group of evaluators, even when it is a property of none
of them. It's a matter of balancing off, not perfect stability. We
could call this the Principle of Independence as Dynamic Equilibrium,
following our practice of grand titles for grim truths. When we want
valid independent evaluation, we don't use the driver-educator to evaluate
the driver-educator, but we use one driver-educator and one Latinist, or
both in one, and that's better even than an accountancy instructor (the
implications for evaluating ethnic studies programs are obvious and
possibly more exciting). To evaluate breeder reactors we use someone
from the Sierra Club and a member of Congress, not a retired judge of
the Supreme Court. (When we're concerned with credibility rather than
validity, we pick the judge and require that the judge hear the others
and that a summary of their briefs be attached to the evaluation report.)
Or, to evaluate a new drug, we use researchers who aren't told what the
drug is supposed to do. In short, fight fire with fire or with oxygen
starvation, not by trying to make everything out of incombustible materials.

The Principles tell us that independence is essential, impermanent,
and situational. Of course, one might say, we all knew that. But then
why didn't we value the knowledge enough to use it? Perhaps because we
also knew, or thought we knew, the opposite: that independent advice is
a luxury, or that it can be provided by a proper organizational
arrangement of supervisors, or that it can only be obtained from really
disinterested people. Knowing contradictory truisms about bias and its
control is knowing nothing about it.




References 



Apple, Michael W., Subkoviak, Michael J., and Lufler, Henry S., Jr.
(Eds.). Education evaluation: Analysis and responsibility.
Berkeley, California: McCutchan Publishing Co., 1974.

Briloff, Abraham. Unaccountable accounting. New York: Harper and
Row, 1973.

Cook, Thomas. The potential and limitations of secondary evaluation.
In Michael W. Apple, Michael J. Subkoviak, and Henry S. Lufler,
Jr. (Eds.), Education evaluation: Analysis and responsibility.
Berkeley, California: McCutchan Publishing Co., 1974.

Eash, Maurice J. Assessing curriculum materials: A preliminary
instrument. Educational product report, February 1969, 2
(5), pp. 18-2?.

House, Ernest R. (Ed.). School evaluation: The politics and process.
Berkeley, California: McCutchan Publishing Co., 1973.

Popham, James W. (Ed.). Evaluation in education: Current application.
Berkeley, California: McCutchan Publishing Co., 1974.

Reinhard, Diane L. Methodology development for input evaluation using 
advocate and design teams. Unpublished doctoral dissertation, Ohio 
State University, 1972. 

Sanders, James. Untitled materials prepared for instruction at
Indiana University, Bloomington, Ind., 1973.

Scriven, Michael. Evaluation perspectives and procedures. In James
W. Popham (Ed.), Evaluation in education: Current application.
Berkeley, California: McCutchan Publishing Co., 1974.

Stake, Robert and Gjerde, Craig. An evaluation of TCITY: The twin
city institute for talented youth. University of Illinois:
the Center for Instructional Research and Curriculum Evaluation,
1971.

Stufflebeam, Daniel L. Meta-evaluation. The Evaluation Center Occasional
Paper Series, 2. Western Michigan University, May 1975.

Wolf, Robert L. The application of select legal concepts to education
evaluation. Unpublished doctoral dissertation, University of
Illinois, 197^. 






Additional copies of papers in this series are available upon
request for a fee of $2.00. Previous papers in this series are
listed below. Please direct your inquiries to

Mary Anne Bunda, Series Editor 
The Evaluation Center 
College of Education 
Western Michigan University 
Kalamazoo, Michigan 49008 

1. A Response to the Michigan Education Department's Defense
of Their Accountability System by Daniel L. Stufflebeam.

2. An Administrator's Perspective of Evaluation by Jack P.
Taylor.

3. Meta-Evaluation by Daniel L. Stufflebeam.

4. Evaluation Bias and Its Control by Michael Scriven.

5. Program Evaluation, Particularly Responsive Evaluation by 
Robert E. Stake. 

6. A Basis for Determining the Adequacy of Evaluation Designs
by James R. Sanders and Dean H. Nafziger.

7. Needed: Instruments as Good as Our Eyes by Henry M. Brickell. 

8. Assessing the Impact of Planned Social Change by Donald T.
Campbell.

9. A Checklist for Evaluating Large-Scale Assessment Programs
by Lorrie A. Shepard.

10. Standards and Criteria by Gene V Glass.

11. Performance Standards by Nancy W. Burton.






